Word Clustering for Collocation-Based Word Sense Disambiguation
نویسندگان
چکیده
The main disadvantage of collocation-based word sense disambiguation is that the recall is low, with relatively high precision. How to improve the recall without decrease the precision? In this paper, we investigate a word-class approach to extend the collocation list which is constructed from the manually sense-tagged corpus. But the word classes are obtained from a larger scale corpus which is not sense tagged. The experiment results have shown that the Fmeasure is improved to 71% compared to 54% of the baseline system where the word-class is not considered, although the precision decreases slightly. Further study discovers the relationship between the F-measure and the number of word-class trained from the various sizes of corpus.
منابع مشابه
Word Sense Induction: Triplet-Based Clustering and Automatic Evaluation
In this paper a novel solution to automatic and unsupervised word sense induction (WSI) is introduced. It represents an instantiation of the ‘one sense per collocation’ observation (Gale et al., 1992). Like most existing approaches it utilizes clustering of word co-occurrences. This approach differs from other approaches to WSI in that it enhances the effect of the one sense per collocation obs...
متن کاملKim, Su Nam and Timothy Baldwin (to appear) Word Sense Disambiguation and Noun Compounds, ACM Transactions on Speech and Language Processing
In this paper, we investigate word sense distributions in noun compounds (NCs). Our primary goal is to disambiguate the word sense of component words in NCs, based on investigation of “semantic collocation” between them. We use sense collocation and lexical substitution to build supervised and unsupervised word sense disambiguation (WSD) classifiers, and show our unsupervised learner to be supe...
متن کاملThe Sense Boundary Decision and the Sense Labeling from Collocation Clustering
This paper discusses the deciding practical sense boundary of homonymous words. One of the serious problems in making dictionaries or thesauri is in the vague boundary of senses. This also becomes a bottleneck in sense disambiguation for practical language processing systems. This paper proposes a deciding method for sense boundary discovery of homonyms using collocation from large corpora and ...
متن کاملOne Sense per Collocation and Genre/Topic Variations
This paper revisits the one sense per collocation hypothesis using fine-grained sense distinctions and two different corpora. We show that the hypothesis is weaker for fine-grained sense distinctions (70% vs. 99% reported earlier on 2-way ambiguities). We also show that one sense per collocation does hold across corpora, but that collocations vary from one corpus to the other, following genre a...
متن کاملIntegrating Collocation Features in Chinese Word Sense Disambiguation
The selection of features is critical in providing discriminative information for classifiers in Word Sense Disambiguation (WSD). Uninformative features will degrade the performance of classifiers. Based on the strong evidence that an ambiguous word expresses a unique sense in a given collocation, this paper reports our experiments on automatic WSD using collocation as local features based on t...
متن کامل